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1.  Introduction 


An  autonomous  unmanned  ground  vehicle  (UGV)  navigates  unstructured  terrain  using  sensors  to 
detect  and  to  classify  its  environs.  Each  sensor  detects  only  a  limited  number  of  attributes  of  the 
world  around  it.  The  UGV  perception  subsystem  must  fuse  the  information  from  multiple 
sensors  so  that  the  traversability  of  terrain  elements  can  be  represented  to  the  planning 
subsystem. 


2.  Registration 


Integration  or  fusion  of  information  from  multiple  sensors  requires  that  the  information  be 
addressable  in  a  common  coordinate  system.  Sensors  of  interest  to  the  UGV  community,  (e.g., 
imaging  ladars  and  color  cameras)  generate  information  about  the  world  in  the  coordinate  system 
of  the  sensor.  Although  it  is  possible  in  principle  to  mechanically  align  the  coordinate  systems,  in 
practice,  precision  alignment  is  not  the  rule.  Fusion  of  information  from  sensors  such  as  these 
requires  the  data  to  be  converted  to  a  common  coordinate  system  in  a  process  known  as 
registration.  In  the  domain  of  research  UGVs,  registration  between  two  sensors  is  typically 
assumed  to  be  time  invariant  and  is  performed  offline  at  the  time  the  sensors  are  installed  on  the 
UGV  and  perhaps  periodically  thereafter.  The  term  “registration”  refers  to  both  the  process  of 
defining  the  mathematical  transformation  between  sensor  coordinate  systems  and  the  mathe¬ 
matical  transformation  itself. 

A  registration  between  a  camera  and  a  three-dimensional  (3-D)  point  cloud  from  a  ladar  sensor  is 
commonly  computed  from  matching  features  in  corresponding  imagery  produced  by  the  two 
sensors  (Elstrom,  Smith,  &  Abidi,  1998).  The  imagery  is  frequently  from  a  scene  purposely 
designed  for  the  registration,  which  provides  features  easily  detected  in  the  imagery  from  either 
sensor.  A  transformation  is  computed  which  consists  of  the  rotation  and  translation  that  projects 
all  3-D  point  features  into  the  corresponding  two-dimensional  (2-D)  image  features,  minimizing 
some  error  term.  This  transformation  is  the  mathematical  transformation  between  the  coordinate 
frames  of  the  two  sensors  and  is  termed  the  registration  transformation  or,  more  commonly,  the 
registration. 

2,1  Sensors 

The  sensors  of  interest  in  this  study  are  an  imaging  ladar  manufactured  by  Schwartz  Electro- 
Optics  (SEO)'  and  a  color  video  camera.  The  color  camera  is  generic,  producing  2-D  color 

'  Design  rights  to  this  unit  are  now  owned  by  General  Dynamics  Robotics  Systems,  Westminster,  Maryland. 
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images  at  640  by  480  pixels  resolution.  An  8-mm  manual  foeus  lens  produces  roughly  a  45- 
degree  field  of  view  (FOV).  Modeling  and  calibration  are  described  in  Oberle  and  Haas  (2004). 

The  SEO  ladar  unit  is  rather  unique  in  today’s  market  in  that  it  generates  a  3-D  image  of  its 
environment  a  number  of  times  per  second  and  so  is  capable  of  use  on  a  moving  vehicle  as  a 
geometric  obstacle  detector  in  an  unstructured  world.  The  32-row  by  180-column  range  image 
consists  of  16  “facets”  that  comprise  4  rows  by  90  columns  of  discrete  elements  called  “rangels” 
(from  “range  pixel”).  The  facets  shown  in  figure  1  tile  the  field  of  view  of  the  range  image,  with 
a  few  rangels  overlapping  between  corresponding  left  and  right  facets.  Each  rangel  represents 
the  range  to  something  in  the  environment  which  has  reflected  enough  energy  to  cross  the 
detection  threshold  of  the  ladar  detector.  Eor  each  rangel,  the  direction  is  known  in  the 
coordinate  frame  of  the  sensor,  so  the  magnitude  and  direction  are  converted  to  the  more  useful 
Cartesian  form.  The  result  is  a  cloud  of  3-D  Cartesian  points  in  the  local  sensor  coordinate 
frame.  The  EOV  of  the  unit  is  86  degrees  horizontal  by  20  degrees  vertical;  its  angular 
resolution  is  0.658  by  0.5  degree.  See  Shneier,  Chang,  Hong,  Cheok,  and  Scott  (2003)  and 
Hong,  Rasmussen,  Chang,  and  Shneier  (2002)  for  other  details. 


2,2  Composite  Registration 

Eor  the  data  set  described  herein,  the  conventional  approach  to  registration  (e.g.,  matching  point 
features)  was  deemed  undesirable  for  the  following  reasons.  First  and  foremost,  point  features 
are  indistinct  in  the  sparse  SEO  imagery.  The  intensity  of  a  rangel  return  was  not  available  from 
this  unit,  only  the  time-of-flight-based  range.  Thus,  there  is  no  reflectance-based  image  in  which 
to  look  for  features  apparent  in  the  camera  image. 

It  is  possible  to  produce  a  ladar-based  image  with  the  use  of  the  two  dimensions  of  the  ladar 
“image  plane”  to  define  an  image  point  and  represent  the  third  dimension  as  a  “false”  color.  A 
well-silhouetted  comer  feature  in  ladar  data  presented  in  this  fashion  can  be  detected.  However, 
the  large  mismatch  in  the  angular  resolution  of  the  two  sensors^  limits  the  ability  to  localize  the 
point  correspondence. 


^One  ladar  image  point  created  in  this  fashion  is  approximately  50  times  the  size  of  a  pixel  from  the  camera 
image. 
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Third,  the  point  features  most  readily  distinguished  in  ladar  imagery  are  well-silhouetted  corners. 
However,  any  edge  in  a  ladar  image  is  subject  to  the  “mixed  pixel”  effect  (Tuley,  Vandapel,  & 
Hebert,  2004).  This  occurs  when  the  ladar  impinges  on  objects  in  the  environment  at  different 
ranges,  resulting  in  returns  from  each.  The  range  reported  is  incorrect  and  is  usually  inter¬ 
mediate  between  the  two  correct  ranges.  An  edge  rangel  is  almost  surely  a  mixed  pixel,  further 
reducing  the  certainty  of  location  of  the  feature. 

A  final  consideration  was  an  interest  in  exploring  a  registration  technique  requiring  less  manual 
feature  matching,  a  task  not  well  suited  to  field  operations  and  maintenance. 

Instead,  an  approach  was  implemented,  based  on  a  variation  of  that  described  in  (Shneier  et  ah, 
2003),  with  elements  of  the  same  sensor  suite  of  the  Shneier  paper.  Sensor  data  from  the  camera 
and  the  SEO  ladar  were  separately  registered  to  high-accuracy,  high-resolution  data  from  a 
Riegk  LMS-210  surveying  ladar  collected  from  the  same  registration  scene.  The  registration 
between  SEO  ladar  and  camera  was  to  be  built  from  the  registrations  of  the  two  sensors  to  the 
Riegl  intermediary.  The  Riegl  was  used  as  an  intermediate  because  the  resolution  is  similar  to 
that  of  the  camera  and  because  the  Riegl  produces  not  only  a  range  at  each  data  point  but  an 
intensity  measure  as  well.  The  intensity  of  the  reflectance  at  each  point  can  be  displayed  in  a 
manner  that  resembles  a  camera  image,  facilitating  matching  of  features  between  the  two 
sensors. 

The  structured  registration  scene  shown  in  figure  2  consists  of  the  concave  intersection  of  two 
walls  and  the  fiat  asphalt-covered  ground.  In  the  foreground  are  several  planar  targets  on 
stanchions,  including  one  target  composed  of  surfaces  and  voids  alternated  in  a  checkerboard¬ 
like  configuration.  The  poster-like  targets  provide  comers  evident  in  both  the  camera  and  Riegl 
reflectance  images.  The  scene  also  contains  the  ground  and  two  intersecting  walls  which  provide 
planar  features  evident  in  data  from  the  SEO  and  Riegl  ladars.  The  registration  between  camera 
and  Riegl  ladar  was  calculated  with  conventional  camera  calibration  methods  that  treated  the  3-D 
feature  locations  from  the  Riegl  sensor  as  a  surrogate  for  ground  tmth  of  the  corner  features.  The 
registration  between  Riegl  and  SEO  ladars  was  approached  in  an  unorthodox  fashion,  which 
comprises  the  focus  of  this  report. 


^Manufactured  by  Riegl,  Horn,  Austria. 
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Figure  2.  Registration  seene  with  red/blue  targets  and  multi-colored 
checkerboard  target. 


3.  Three-Dimensional  to  Three-Dimensional  Registration 


Registration  between  the  coarse  SEO  ladar  and  the  high-resolution  Riegl  surveying  ladar  posed  a 
new  set  of  issues.  The  ICP  (iterative  closest  or  corresponding  point)  algorithm,  currently  the 
dominant  method  for  3-D-to-3-D  registration  (Rusinkiewicz  &  Levoy,  2001),  was  unfamiliar  to 
the  author  at  the  onset  of  the  effort,  and  disappointing  early  experiences  with  ICP  (not  herein 
documented)  argued  for  a  simpler  approach.  Instead,  the  author  implemented  a  method  based  on 
the  computation  of  the  parameters  defining  the  planes  that  correspond  to  the  two  walls  and  the 
ground  in  the  registration  scene  and  determined  the  rotation  and  translation  between  the  two 
coordinate  frames  so  defined.  In  general,  good  descriptions  of  surfaces  can  be  extracted  from 
ladar  point  clouds  because  a  large  number  of  data  points  support  the  small  number  of  parameters 
describing  a  surface.  In  such  an  overdetermined  system,  the  effect  of  a  large  error  in  an 
individual  point  can  be  expected  to  be  overwhelmed  by  the  large  number  of  points  without  large 
error. 

At  the  desired  10-  to  20-meter  range  (the  range  of  greatest  interest  in  the  intended  use),  the 
resolution  of  the  SEO  put  only  30  or  so  points  on  the  0.5-m-square  targets,  with  the  edges  subject 
to  the  “mixed  pixel”  effect.  The  corners  were  not  clearly  evident  in  the  SEO  image.  The  use  of 
planar  data  from  the  targets  was  deemed  un-promising.  Only  the  planar  data  from  the  walls  were 
used  in  the  computation  of  the  registration. 
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3.1  Wall-Based  Method 

The  method  is  to  extraet  from  the  ground  and  eaeh  of  the  two  walls  of  the  registration  seene  the 
equation  deseribing  the  plane  embedded  therein.  The  veetors  deseribing  the  orthogonal  planes 
in  one  ladar’s  image  are  then  rotated  into  the  eorresponding  veetors  in  the  other  ladar’s  image  to 
ealeulate  the  rotation  and  translation  between  the  two  images.  The  proeess  of  extraeting  the 
plane  oecurs  in  several  steps.  First,  the  SEO  point  eloud  is  rotated  180  degrees  about  the  y  axis 
to  resolve  axis-naming  eonventions  between  the  ladars.  The  result  is  an  approximate  alignment 
between  the  two  sensors.  The  pair  of  point  elouds  is  then  proeessed  by  means  of  software 
developed  at  Carnegie  Mellon  University,  deseribed  in  Vandapel,  Donamukkala,  and  Hebert 
(2003),  whieh  separates  points  on  the  ground  from  others  in  the  vegetation  above  the  ground.  A 
“ground”  point  has  no  points  beneath  it  in  the  data  set,  while  a  “vegetation”  (or  “non-ground”) 
point  oecurs  above  other  points.  For  the  flat  terrain  of  the  parking  lot,  the  effect  is  to  classify 
the  walls  and  targets  (imperfectly)  as  vegetation.  The  resulting  “ground”  is  evaluated  manually 
for  outliers  (tops  of  walls  and  posters  erroneously  classified  as  ground)  and  these  points  are 
removed  by  ad  hoc  methods.  The  remaining  points  are  substantially  planar,  and  these  are 
submitted  to  a  least  squares  planar  fit,  with  points  that  deviate  more  than  a  few  inches  from  the 
fit  being  culled.  The  remaining  points  are  well  described  by  the  calculated  plane  and  are  judged 
to  belong  to  the  “ground”  class. 

The  two  walls  are  treated  in  a  similar  fashion.  The  entire  point  cloud  is  rotated  so  that  the  z  axis 
in  the  new  coordinate  frame  is  approximately  perpendicular  to  the  surface  of  one  of  the  walls.  The 
rotated  wall,  which  now  has  the  characteristics  required  of  “ground”  by  the  ground  extraction 
code,  is  isolated  and  extracted.  Parameters  of  the  ground  plane  are  calculated  and  rotated  back  into 
the  original  coordinate  frame.  An  example  of  a  wall  extracted  in  this  fashion  is  depicted  in 
figure  3. 

3.2  Formulation 

At  this  point  in  the  process,  the  parameters  of  each  wall  (right,  left,  and  ground"^)  have  been 
calculated  in  the  form 

Bws  *  Xws  +  bws  ~  0  where 

/ 

the  notation  V  indicates  the  transpose  of  vector  (or  matrix)  V; 

Bws  ~  [bIws  b2ws  3.3 ws],  and  |  3ws  |  =  1; 
al  ws,  32ws,  bSws,  and  bws  are  scalar; 

the  subscript  w  can  assume  the  values  {1,  2,  3}  corresponding  to  {ground,  right  wall,  left  wall}; 

^We  will  hereafter  refer  to  the  ground  as  one  of  the  walls  defining  the  embedded  coordinate  frame.  Hopefully, 
this  will  not  be  confusing,  especially  since  we  treated  the  walls  as  ground  in  the  previous  section. 
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the  subscript  s  can  assume  the  values  {g,  r}  corresponding  to  {SEO  or  Riegl}; 

Xws  is  of  the  form  [x  y  z] , 

and  X,  y,  and  Z  are  the  scalar  coordinates  of  a  data  point  from  the  w  subset  of  the  s  sensor  point 
cloud  in  local  s  sensor  coordinate  frame. 


Figure  3.  The  right-hand  wall  isolated  from  the  SEO  data  is  shown  in 
the  top  image.  (Rectangular  voids  are  occluded  by  posters 
in  the  foreground,  which  are  eliminated  as  “non-ground” 
elements  of  the  data.  The  wall  is  also  shown  in  white  in  the 
lower  image,  rotated  90  degrees  to  plan  view,  along  with 
points  deemed  non-ground,  shown  in  blue.  Planar  targets 
can  be  seen  as  clusters  of  blue  points  in  lower  right  of  this 
image.) 


Let  US  use  these  results  to  determine  the  rotation  and  translation  between  the  coordinate  frames 
of  the  two  sensors. 


First,  we  interpret  the  wall  parameters.  The  unit  vectors  Sis,  R2s,  and  SBs  are  the  perpendicular 
vectors  to  the  three  walls  in  the  coordinate  system  of  sensor  s.  They  form  the  basis  of  a 
coordinate  frame  embedded  in  the  walls  of  the  scene,  as  seen  by  the  sensor  s.  The  origin  of  this 
embedded  coordinate  frame  is  the  intersection  of  the  three  walls.  We  call  the  embedded 
coordinate  frame  “e.”  These  basis  vectors  can  be  organized  into  a  rotation  matrix. 

The  rotation  matrix  As  ,  defined  as 
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^Is 
^2s 

fors  =  {g,r}, 

^35 

rotates  a  point  in  the  embedded  coordinate  frame  to  the  coordinate  frame  of  the  s  sensor. 

The  vector  bs  =  [bis  b2s  bss]'  is  the  origin  of  the  sensor  frame,  measured  in  the  embedded 
frame,  and  so  represents  the  translation  from  the  sensor  coordinate  frame  to  the  embedded 
coordinate  frame.  A  point  in  the  embedded  coordinate  frame  can  be  mapped  to  the 
coordinate  frame  of  sensor  s  by 

Ps  =  As*£e-  (As*bs).  (1) 

Now  we  compute  the  rotation  Rrg  and  translation  trg  to  transform  a  point  in  the  coordinate 
frame  of  the  g  sensor  to  the  coordinate  frame  of  the  r  sensor. 

The  rotation  Rrg  between  the  coordinate  frames  of  the  two  sensors  can  be  calculated  as  the 
rotation  between  the  orthogonal  basis  matrices  Ag  and  Ar, 

Rjg  =  Ar  *  inv(Ag)  . 

The  translation  between  the  sensor  coordinate  frames  is  somewhat  more  complex  because  the 
measured  translations  are  defined  in  the  embedded  frame,  not  in  either  sensor  frame.  The 
translation  between  the  coordinate  frames  of  the  two  sensors  is  expressed  as 

trg  =  (Ar  *  (  bg  -  br  )). 

Intermediate  steps  in  the  development  of  this  expression  are  documented  in  appendix  A. 

4.  Metrics 


4,1  Quantitative 

Although  there  are  several  alternate  ways  of  calculating  the  transformation  between  coordinate 
frames,  there  is  no  definitive  quantitative  measure  of  the  correctness  of  the  transformation. 
Rusinkiewicz  and  Levoy  (2001)  suggest  a  metric  based  on  the  sum  of  squared  distances  (SSD) 
from  a  point  to  the  nearest  surface.  In  this  case,  this  could  be  implemented  as  the  SSD  from  SEO 
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points  to  the  corresponding  Riegl  plane.  This  measure  was  not  implemented  as  part  of  this  study 
but  should  be  considered  for  any  successor  studies. 

4,2  Visual  Alignment 

Qualitative  measures  are  commonly  used  to  assess  the  correctness  of  the  transformation.  One 
measure  is  inspection  of  the  alignment  of  one  point  cloud  transformed  into  the  coordinate  frame 
of  the  other.  The  mechanism  used  in  this  case  is  the  Virtual  Reality  Modeling  Language 
(VRML)  visualization  tool.^  The  SEO  points  were  transformed  to  the  Riegl  frame  with  the 
registration  calculated  as  described  previously  and  were  assigned  a  color  (red).  The  Riegl  points 
were  assigned  a  different  color  (white),  and  both  sets  of  points  were  viewed  as  part  of  the  same 
VRML  view  (see  figure  4).  With  built-in  tools  for  moving  the  viewpoint  around  in  the  viewer, 
gross  misalignment  of  the  two  point  sets  is  apparent.  Minor  misalignments  may  escape  scrutiny, 
however.  One  disadvantage  is  that  the  viewer  has  no  cue  to  differentiate  points  in  the  foreground 
from  points  in  the  background,  so  features  that  cannot  easily  be  silhouetted  are  difficult  to  assess. 
Another  disadvantage  is  that  the  only  depth  cues  come  from  moving  the  viewing  perspective,  so 
documenting  phenomena  for  a  presentation  or  paper  is  difficult.  This  method  was  used  for  initial 
assessment  of  the  quality  of  3-D-to-3-D  registration. 


Figure  4.  SEO  data  (in  red)  juxtaposed  against  Riegl 
data  (in  white)  (plan  view).  (Note  that  the 
SEO  FOV  is  wider  than  Riegl  FOV.) 


4,3  Project  Points  Into  Image 

A  method  for  evaluating  the  quality  of  a  3-D-to-2-D  registration  is  to  project  the  3-D  points  into 
the  corresponding  image^.  A  3-D  point  is  assigned  the  color  of  the  pixel  into  which  it  projects 

am  indebted  to  Dr.  Nicolas  Vandapel  for  introducing  me  to  this  approach, 
am  indebted  to  Dr.  Vandapel  and  Mr.  Ranjith  Unnakrishnan  for  suggesting  this  method. 
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and  is  viewed  in  a  color  VRML  view.  A  high-quality  registration  is  distinguished  by  appropriate 
color  of  the  3-D  features  in  the  view.  This  method  was  used  to  evaluate  the  end-to-end 
registration  between  camera  and  one  or  the  other  of  the  ladar  sensors.  It  works  reasonably  well 
for  high-resolution  point  clouds  such  as  the  Riegl.  The  low-resolution  point  clouds  of  the  SEO 
were  more  difficult  to  assess.  When  the  camera  and  ladar  are  far  apart,  points  in  the  background 
frequently  assume  unexpected  colors,  since  the  sensors  are  occluded  differently  by  features  in  the 
foreground.  Only  features  in  the  foreground  can  be  relied  upon  to  display  the  color  expected. 

An  example  of  a  good  registration  is  shown  in  figure  5,  which  depicts  the  points  from  the  Riegl 
ladar  projected  into  the  color  camera  image. 


Figure  5.  Visualization  of  registration  of  Riegl  sensor  data  to  eolor 

eamera  image  at  lower  left.  (The  eolors  of  the  3-D  foreground 
posters  match  the  image,  as  do  un-occluded  elements  of  the 
background  [the  door,  for  example],  indicating  a  good 
registration.) 


5.  This  Registration 


The  registration  data  were  collected  in  May  2003  at  the  campus  of  the  National  Institute  for 
Standards  and  Technology  in  Gaithersburg,  Maryland. 

The  rotation  between  Riegl  and  SEO  sensors  was  computed  with  the  technique  described  herein. 
The  equations  describing  the  walls,  as  seen  by  the  SEO  sensor,  were 
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Ar  =  bs  = 


0.098932 

0.0089963 

0.99505 

2.2564 

0.83984 

-0.53779 

-0.073874 

12.23 

0.53257 

0.84365 

-0.06791 

15.453 

0.099795 

0.0049212 

0.995 

bs  = 

0.74841 

0.83659 

-0.54593 

-0.045581 

13.288 

0.55443 

0.83099 

-0.045497 

16.101 

Note  that  in  this  form,  the  rows  of  the  matriees  are  in  the  form  of  &ws.  Sinee  these  As  matriees 
are  not  quite  orthogonal,  they  are  brought  into  orthogonality  by  singular  value  deeomposition  (svd). 

Next,  the  rotation  between  the  sensor  coordinate  frames  is  computed.  The  rotation  from  the  SEO 
coordinate  frame  to  the  Riegl  coordinate  frame  is  computed  as 

Rjg  = 

0.99969  -0.017033  0.017899 

0.016979  0.99985  0.0031528 

-0.01795  -0.002848  0.99983 

Finally,  the  translation  from  SEO  to  Riegl  coordinate  frames  is  computed: 
trg  = 

1.0851 

-0.042551 

-1.6228 

The  registration  is  evaluated  first  by  an  inspection  of  the  alignment  juxtaposing  the  registered  data 
sets  in  a  3-D  visualization  (see  figure  6).  The  alignment  of  the  walls  is  substantially  better  than  in 
figure  4.  However,  close  inspection  reveals  that  the  alignment  of  the  posters  is  poor  (figure  7). 

The  registration  is  then  evaluated  by  projection  of  the  SEO  points  into  the  camera  image,  as  was 
done  in  figure  3.  The  results,  shown  in  figure  8,  corroborate  that  the  SEO  sensor  and  camera  are 
not  well  registered.  In  particular,  there  appears  to  be  an  offset  of  approximately  1  foot  in  the 
vertical  direction.  There  is  also  an  apparent  discontinuity  in  the  SEO  data  where  the  right  and 
left  facets  merge.  The  calculated  rotation  between  the  two  sensors  appears  correct. 

Other  anomalies  were  identified  in  the  vicinity  of  the  red  and  blue  targets,  which  were  constructed 
of  retro-reflective  material.  The  SEO  sensor  appears  to  detect  these  targets  at  a  relatively  shorter 
range  than  does  the  Riegl.  The  large  multi-colored  target,  which  is  not  retro-reflective,  does  not 
evidence  this  effect. 
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Figure  6.  SEO  data  (in  aqua)  juxtaposed  against 
Riegl  data  (in  black)  (plan  view). 


Figure  7.  SEO  data  (red)  for  large  poster  juxtaposed 
with  corresponding  Riegl  data  (blue). 


Figure  8.  The  colors  assigned  to  the  posters  when  the  SEO  data  are  projected  into  the  camera  image  (right) 
do  not  align  correctly  with  the  colors  and  the  boundaries  of  the  camera  image  itself  (left), 
indicating  a  poor  registration  between  SEO  sensor  and  camera. 
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6.  Conclusions 


The  wall-based  approaeh  to  aligning  two  imaging  ladar  sensors  is  an  appealing  notion.  Upon 
suffieient  refleetion,  the  geometry  is  intuitive,  and  the  mathematies  to  implement  it  are  straight¬ 
forward. 

The  benefit  of  sueh  an  approaeh  is  that  walls  in  a  mutually  orthogonal  eonfiguration  are  available 
almost  anywhere,  so  in  prineiple,  the  ladar-to-wall  portion  of  a  UGV  sensor  alignment  eould  be 
improvised  in  the  field  with  a  minimum  of  equipment  and  with  relatively  unsophistieated  software. 
The  wall-to-eamera  portion  of  the  registration  is  required  to  eomplete  the  ladar-to-eamera  registra¬ 
tion,  whieh  is  the  end-to-end  result  desired.  Wall-to-eamera  registration  without  manual  feature 
designation  in  sueh  an  improvised  registration  scene  is  left  to  another  study. 

However,  this  study  did  not  result  in  a  successful  registration  of  the  two  sensors  with  this  data 
set.  The  most  likely  source  of  error  is  the  calibration  of  the  SEO  sensor  used  to  convert  the  range 
image  to  a  Cartesian  point  cloud.  The  calibration  used  in  this  study  is  based  on  a  theoretical 
(rather  than  calibrated)  set  of  direction  vectors  mapping  the  range  image  into  a  cloud  of  3-D 
points.  At  least  at  the  center  where  the  two  facets  overlap,  the  calibration  is  suspect.  A  precise 
determination  of  the  source  of  error  is  beyond  the  scope  of  this  study. 

The  visualization  tools  used  to  evaluate  the  quality  of  the  registration  were  inadequate,  especially 
given  the  resolution  of  the  SEO  sensor.  Having  more  than  one  (preferably  at  least  three)  corner 
point  in  the  EOV  would  help  quantify  the  effectiveness  of  the  registration.  It  would  also  be 
helpful  to  increase  the  density  of  the  SEO  scan  by  slewing  the  sensor,  although  it  increases  the 
complexity  of  the  sensor  calibration. 
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Appendix  A.  Intermediate  Steps  in  the  Development  of  Expression  for 
Transformation  Between  Coordinate  Frames 

From  equation  1,  for  some  sensor  s,  a  point  l^e  in  the  embedded  coordinate  frame  can  be  mapped 
to  the  coordinate  frame  of  sensor  s  by 

Ps  =  As  *  pe  -  (As  *  bs  ).  (1) 

Conversely, 

Pe  =  inv(As)  *  ^s  +  bs 
Recall  that  bs  is  a  measure  in  the  embedded  frame  e. 

The  point  in  coordinate  frame  s  can  be  mapped  to  another  coordinate  frame  t  by 

gt  =  At  *  -  (At  *  bt ) 

=  At  *  (inv(As)  *  ps  +  bs )  -  (At  *  bt ) 

=  At  *  inv(As)  *  ps  +  At  *  bs  -  (At  *  b t  ) 

=  Rts  *  ps  +  (At  *  ( bs  “  bt )) 

=  Rts  *  Ps  +  tts 

So  the  translation  tts  from  frame  s  to  frame  t  is 

tts  =  (At  *  ( bs  “  bt )) 
and  the  rotation  Rts  from  frame  s  to  frame  t  is 

Rts  =  At  *  inv(As)  • 
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